Near-Neighbor Query Performance in Search Trees
Abstract
Near-neighbor search is an increasingly important operation for queries over multimedia, text, and other non-standard datatypes. In large databases, near-neighbor searches must be supported by indexed retrieval for efficiency. In this paper, we present a detailed analysis of three proposals for near-neighbor search: one based on the R-tree, and two that motivated the invention of new trees, namely the SS-tree and the SR-tree. We find that while the new trees do improve performance, the improvement comes mostly from a new Penalty metric, and not from the various other details of their implementation. Our analysis was done using a Generalized Search Tree, which both allowed us to make a fair comparison easily and provided the framework for a clearer analysis of the issues at hand.
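The abstract does not define the Penalty metric, so the following is only an illustrative sketch of why the choice of penalty can matter. It assumes the commonly described behaviors: an R-tree style penalty based on bounding-box area enlargement, and an SS-tree style penalty based on distance to a subtree's centroid. All function names and the sample data are hypothetical and are not taken from the paper or from any Generalized Search Tree implementation.

```python
import math

def area(lo, hi):
    """Volume of an axis-aligned box given its low and high corners."""
    return math.prod(h - l for l, h in zip(lo, hi))

def enlargement_penalty(lo, hi, point):
    """R-tree style penalty (assumed): extra area needed to cover point."""
    new_lo = [min(l, p) for l, p in zip(lo, point)]
    new_hi = [max(h, p) for h, p in zip(hi, point)]
    return area(new_lo, new_hi) - area(lo, hi)

def centroid_penalty(centroid, point):
    """SS-tree style penalty (assumed): Euclidean distance to the centroid."""
    return math.dist(centroid, point)

def choose_subtree(penalties):
    """Index of the subtree with the smallest penalty."""
    return min(range(len(penalties)), key=penalties.__getitem__)

# Two hypothetical subtrees: a long thin box A and a small square box B.
boxes = [((0, 0), (10, 1)), ((4, 2), (6, 4))]
centroids = [(5, 0.5), (5, 3)]
point = (5, 1.5)

by_area = choose_subtree([enlargement_penalty(lo, hi, point) for lo, hi in boxes])
by_centroid = choose_subtree([centroid_penalty(c, point) for c in centroids])
```

In this made-up configuration the two penalties send the same point to different subtrees: area enlargement favors the small box B, while centroid distance favors A, whose centroid is closer to the point.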
Similar Resources
Probabilistic cost model for nearest neighbor search in image retrieval
We present a probabilistic cost model to analyze the performance of the kd-tree for nearest neighbor search in the context of content-based image retrieval. Our cost model measures the expected number of kd-tree nodes traversed during the search query. We show that our cost model has high correlations with both the observed number of traversed nodes and the runtime performance of search queries...
Which Space Partitioning Tree to Use for Search?
We consider the task of nearest-neighbor search with the class of binary-space-partitioning trees, which includes kd-trees, principal axis trees and random projection trees, and try to rigorously answer the question “which tree to use for nearest-neighbor search?” To this end, we present the theoretical results which imply that trees with better vector quantization performance have better search ...
Distance Based Indexing for String Proximity Search
In many database applications involving string data, it is common to have near neighbor queries (asking for strings that are similar to a query string) or nearest neighbor queries (asking for strings that are most similar to a query string). The similarity between strings is defined in terms of a distance function determined by the application domain. The most popular string distance measures a...
Nearest Neighbor Search using Kd-trees
We suggest a simple modification to the kd-tree search algorithm for nearest neighbor search, resulting in improved performance. The kd-tree data structure seems to work well for finding nearest neighbors in low dimensions, but its performance degrades once the number of dimensions increases to more than three. Since the exact nearest neighbor search problem suffers from the curse of dimensi...
Orthogonal Range Searching in Moderate Dimensions: k-d Trees and Range Trees Strike Back
We revisit the orthogonal range searching problem and the exact ℓ∞ nearest neighbor searching problem for a static set of n points when the dimension d is moderately large. We give the first data structure with near linear space that achieves truly sublinear query time when the dimension is any constant multiple of log n. Specifically, the preprocessing time and space are O(n) for any constant ...